-
Notifications
You must be signed in to change notification settings - Fork 16
Fix linalg.matmul_transpose_b for big tiles
#410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: dchigarev <[email protected]>
| if (transpose) { | ||
| std::swap(newRowOffs, newColOffs); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
placing std::swap inside a nested for-loop was a bad idea since it swaps the values each iteration producing non-sense offsets at the end
Signed-off-by: dchigarev <[email protected]>
| %subview_1 = memref.subview %arg0[%arg3, 0] [32, 1024] [1, 1] : memref<1024x1024xf16> to memref<32x1024xf16, strided<[1024, 1], offset: ?>> | ||
| %subview_2 = memref.subview %arg1[%arg4, 0] [32, 1024] [1, 1] : memref<1024x1024xf16> to memref<32x1024xf16, strided<[1024, 1], offset: ?>> | ||
| linalg.matmul_transpose_b ins(%subview_1, %subview_2 : memref<32x1024xf16, strided<[1024, 1], offset: ?>>, memref<32x1024xf16, strided<[1024, 1], offset: ?>>) outs(%subview_0 : memref<32x32xf16, strided<[1024, 1], offset: ?>>) | ||
| scf.parallel (%arg3, %arg4) = (%c0, %c0) to (%c1024, %c1024) step (%c16, %c64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
increased the tile size for Y axis to test the problematic case
| // CHECK: %[[tB:.+]] = xegpu.update_nd_offset %[[rootB]], [%c0, %c0] | ||
| // CHECK: %[[tB1:.+]] = xegpu.update_nd_offset %[[rootB]], [%c16, %c0] | ||
| // CHECK: %[[tB2:.+]] = xegpu.update_nd_offset %[[rootB]], [%c32, %c0] | ||
| // CHECK: %[[tB3:.+]] = xegpu.update_nd_offset %[[rootB]], [%c48, %c0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it used to do something like:
xegpu.update_nd_offset %[[rootB]], [%c0, %c0]
xegpu.update_nd_offset %[[rootB]], [%c16, %c0]
xegpu.update_nd_offset %[[rootB]], [%c32, %c16]
xegpu.update_nd_offset %[[rootB]], [%c16, %c32]
It was discovered that
linalg.matmul_transpose_bis lowered to xegpu incorrectly in the case of large tiles. The issue is caused by astd::swapplaced inside a nested for-loop, which, instead of swappingrowOffsandcolOffsonly once, performs the swap in every iteration, resulting in incorrect offsets.